{janitor}
Oftentimes, capitalization and spacing in column names are a nightmare when bringing in data.
Column names such as "cutoff 20%" and "FruitPerDay" are hard to refer to consistently, and names containing spaces or symbols must be wrapped in backticks whenever we refer to them.
What janitor::clean_names() does is transform your variable names into consistent capitalization and spacing, for example cutoff_20 and fruit_per_day.
It’s usually one of the first functions that I use when I load data.
clean_names(): column names before and after
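A minimal sketch of the before and after comparison, assuming the {janitor} package is installed. The data frame `messy` and its values are made up for illustration; only the two column names come from the slide. The exact cleaned name for symbols such as % may depend on the janitor version, so no expected output is shown.

```r
library(janitor)

# Hypothetical data frame with messy column names (values are made up)
messy <- data.frame(
  `cutoff 20%` = c(TRUE, FALSE),
  FruitPerDay = c(2, 5),
  check.names = FALSE  # keep the column names exactly as written
)

# Before: names contain a space, a symbol, and mixed capitalization
names(messy)

# After: clean_names() returns the same data with consistent snake_case names
cleaned <- clean_names(messy)
names(cleaned)
```

After cleaning, columns can be referred to without backticks, for example `cleaned$fruit_per_day`.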
Tidy data: a standard for organizing data so that it is ready for analysis.
Many tools in R, such as plotting, are most effective when your data is in a Tidy format.
Tidy data is easier to understand by looking at common un-Tidy data situations.
The solution for this is the {tidyr} package!
{tidyr}
Part of the tidyverse. Main functions: pivot_longer(), pivot_wider(), separate(), drop_na().
pivot_longer()
Example columns: Quarter, Sales.
Solution: make the data into a longer format.
This is a common transformation, as it is easier to do data entry in a wider format, but the tools we use in programming often require a longer format.
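A minimal sketch of the longer transformation, assuming {tidyr} is installed. The table `sales_wide`, the `Store` column, and all values are hypothetical; only the target column names Quarter and Sales come from the slide.

```r
library(tidyr)

# Hypothetical wide table: one column per quarter, as it might be typed in
sales_wide <- data.frame(
  Store = c("A", "B"),
  Q1 = c(10, 20),
  Q2 = c(15, 25)
)

# Gather the quarter columns into two columns:
# the old column names go to "Quarter", the cell values go to "Sales"
sales_long <- pivot_longer(
  sales_wide,
  cols = c(Q1, Q2),
  names_to = "Quarter",
  values_to = "Sales"
)

sales_long
```

Each row of `sales_long` is now one store in one quarter, which is the shape most plotting and summarizing tools expect.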
pivot_wider()
Problem: the gene column contains both mutations and expression, and the value column contains both gene expression and mutational status.
Solution: make the data into a wider format. (Illustration: Tidyexplain)
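A minimal sketch of the wider transformation, assuming {tidyr} is installed. The data frame `long_df`, the `sample` column, and all values are hypothetical stand-ins for the dataset on the slide; mutational status is coded here as 0/1 purely for illustration.

```r
library(tidyr)

# Hypothetical long table: "gene" mixes two different variables,
# so "value" mixes mutational status and expression level
long_df <- data.frame(
  sample = c("s1", "s1", "s2", "s2"),
  gene = c("KRAS_mutation", "KRAS_expression",
           "KRAS_mutation", "KRAS_expression"),
  value = c(1, 2.5, 0, 1.2)
)

# Spread "gene" into column names and "value" into the cells,
# giving one row per sample and one column per variable
wide_df <- pivot_wider(long_df, names_from = gene, values_from = value)

wide_df
```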
Look at tidy2 first. Then, using tidy2, map:
x to KRAS_mutation
y to KRAS_expression
color to KRAS_mutation
pivot_longer() can undo what we did in pivot_wider(), and vice versa.
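The claim that the two pivots undo each other can be checked with a small round trip. This is a sketch assuming {tidyr} is installed; `wide_df` and its values are hypothetical and are not the tidy2 data from the slides.

```r
library(tidyr)

# Hypothetical wide table with one column per variable
wide_df <- data.frame(
  sample = c("s1", "s2"),
  KRAS_mutation = c(1, 0),
  KRAS_expression = c(2.5, 1.2)
)

# Step 1: go longer, stacking the two KRAS columns
long_df <- pivot_longer(
  wide_df,
  cols = c(KRAS_mutation, KRAS_expression),
  names_to = "gene",
  values_to = "value"
)

# Step 2: go wider again, which restores the original columns
wide_again <- pivot_wider(long_df, names_from = gene, values_from = value)

wide_again
```

The result is a tibble rather than a base data frame, but the column names and values match the starting table.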
separate()
Solution: Let’s separate it:
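A minimal sketch of separate(), assuming {tidyr} is installed. The slide's original dataset is not shown here, so the data frame `visits`, the combined column `site_year`, and the underscore delimiter are all hypothetical.

```r
library(tidyr)

# Hypothetical table where one column holds two variables at once
visits <- data.frame(
  id = 1:2,
  site_year = c("Boston_2020", "Seattle_2021")
)

# Split the combined column at the underscore into two new columns
separated <- separate(
  visits,
  col = site_year,
  into = c("site", "year"),
  sep = "_"
)

separated
```

Note that the new columns are character by default; passing `convert = TRUE` asks separate() to guess better types, such as integer for `year`.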
We have looked at clear cases of when a dataset is Tidy. In reality, the Tidy state depends on what we call variables and observations for the analysis we want to conduct.
Tools such as ggplot require a precise definition of our variables, so planning ahead how we want to use these tools clarifies what we should call variables and observations.
Tip: think about what you want to do with the data, and work backwards. That will help you identify whether the data is tidy or not.
What analysis would you want to do with this dataset, and what kind of transformation would you do to get it Tidy?